Thunder-DeID: Accurate and Efficient De-identification Framework for Korean Court Judgments
Hahm, Sungeun, Kim, Heejin, Lee, Gyuseong, Park, Hyunji, Lee, Jaejin
To ensure a balance between open access to justice and personal data protection, the South Korean judiciary mandates the de-identification of court judgments before they can be publicly disclosed. However, the current de-identification process is inadequate for handling court judgments at scale while adhering to strict legal requirements. Additionally, the legal definitions and categorizations of personal identifiers are vague and not well-suited for technical solutions. To tackle these challenges, we propose a de-identification framework called Thunder-DeID, which aligns with relevant laws and practices. Specifically, we (i) construct and release the first Korean legal dataset containing annotated judgments along with corresponding lists of entity mentions, (ii) introduce a systematic categorization of Personally Identifiable Information (PII), and (iii) develop an end-to-end deep neural network (DNN)-based de-identification pipeline. Our experimental results demonstrate that our model achieves state-of-the-art performance in the de-identification of court judgments.
- Europe (1.00)
- Asia > South Korea (1.00)
- North America > United States (0.93)
- Transportation > Passenger (1.00)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Blau, Yochai, Agrawal, Rohan, Madmony, Lior, Wang, Gary, Rosenberg, Andrew, Chen, Zhehuai, Gekhman, Zorik, Beryozkin, Genady, Haghani, Parisa, Ramabhadran, Bhuvana
Accurate recognition of specific categories, such as persons' names, dates or other identifiers is critical in many Automatic Speech Recognition (ASR) applications. As these categories represent personal information, ethical use of this data including collection, transcription, training and evaluation demands special care. One way of ensuring the security and privacy of individuals is to redact or eliminate Personally Identifiable Information (PII) from collection altogether. However, this results in ASR models that tend to have lower recognition accuracy of these categories. We use text injection to improve the recognition of PII categories by including fake textual substitutes of these categories in the training data. We demonstrate substantial improvement to Recall of Names and Dates in medical notes while improving overall WER. For alphanumeric digit sequences we show improvements to Character Error Rate and Sentence Accuracy.
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
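The abstract above describes injecting fake textual substitutes for PII categories into training data. A minimal sketch of that data-preparation step, assuming hypothetical placeholder tokens (`<NAME>`, `<DATE>`) and invented substitute pools (the paper's actual pipeline and category inventory are not specified here):

```python
import random

# Hypothetical pools of fake substitutes for two PII categories.
FAKE_NAMES = ["Alice Morgan", "Ravi Patel", "Mei Chen"]
FAKE_DATES = ["March 3rd 2021", "July 14th", "the 2nd of May"]

def inject_fake_pii(template: str, rng: random.Random) -> str:
    """Fill PII placeholders in a transcript template with fake substitutes."""
    out = template.replace("<NAME>", rng.choice(FAKE_NAMES))
    out = out.replace("<DATE>", rng.choice(FAKE_DATES))
    return out

rng = random.Random(0)
templates = ["patient <NAME> was seen on <DATE>",
             "<NAME> reported symptoms starting <DATE>"]
injected = [inject_fake_pii(t, rng) for t in templates]
```

The injected text-only examples can then be mixed into training alongside real (redacted) audio-text pairs, so the model still sees name- and date-like strings without any real PII being collected.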
Protecting Sensitive Data in Analytics: A Data Engineering Perspective
Our team has shared the most effective ways to keep data safe, including key techniques such as tokenisation, suppression and cryptographic encryption. Data-driven solutions help organisations make better decisions, improve efficiency, create better experiences for customers and, ultimately, bring in more revenue. But the growth of big data is outpacing the protection of such information. With the ever-increasing amount of data being collected, stored and processed, it is essential for data engineers to understand how best to handle personal information for analytics. Data engineers frequently spend their days striking a balance between two responsibilities: harnessing large amounts of sensitive or personal data to innovate and drive change, while also adhering to strict standards that govern how that data should be handled and used.
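Two of the techniques named above, tokenisation and suppression, can be sketched in a few lines. This is a minimal illustration, not a production design: the key handling, field names and record shape are all assumptions, and a real system would use a managed key service rather than a hard-coded secret.

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # hypothetical key; real systems use a managed KMS

def tokenise(value: str) -> str:
    """Replace a direct identifier with a stable, keyed pseudonym (token)."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:12]

def suppress(record: dict, fields: list[str]) -> dict:
    """Drop fields that analytics does not need at all."""
    return {k: v for k, v in record.items() if k not in fields}

record = {"email": "jane@example.com", "ssn": "123-45-6789", "purchase": 42.5}
safe = suppress(record, ["ssn"])          # remove what analytics never needs
safe["email"] = tokenise(safe["email"])   # pseudonymise what joins still need
```

Keyed tokenisation keeps the token stable across datasets, so analysts can still join on it, while suppression simply removes fields with no analytic value; the two are typically combined rather than chosen between.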
What is Data Anonymization?
Data anonymization is the process of mitigating direct and indirect privacy risks within data, such that there is a measurable way to ensure records cannot be attributed to a specific individual or entity. With an estimated 2.5 quintillion bytes of data being generated every day and an increasing reliance on data to power new applications, machine learning models and AI technologies, implementing effective anonymization techniques and removing any bottlenecks is crucial to accelerating future developments and innovations. This post is a general introduction to anonymization, and the tools and techniques for providing sufficient privacy protections, so that personally identifiable information (PII) is safe from exposure and exploitation. Data anonymization should be considered a continuous process; one that can require rapid iteration of applying various privacy engineering techniques and then measuring those privacy outcomes until a desired end state is reached. In the following sections, we'll dive deeper into our core tenets of the data anonymization process, and then walk through how you might apply them to a notional dataset.
- Europe (0.15)
- North America > United States > California (0.05)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
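The "apply a technique, then measure the outcome, then iterate" loop described above can be sketched on a notional dataset. The sketch below uses a simple generalisation step (widening age bands) and a k-anonymity-style risk measure (the size of the rarest quasi-identifier group); the field names, the starting band width and the threshold k are all illustrative assumptions, not the post's actual methodology.

```python
from collections import Counter

def generalise_age(age: int, width: int) -> str:
    """Generalise an exact age into a band of the given width, e.g. 34 -> '30-34'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def smallest_group(rows, keys):
    """Risk measure: size of the rarest group sharing the same quasi-identifiers."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return min(counts.values())

rows = [{"age": 34, "zip": "94107"}, {"age": 36, "zip": "94107"},
        {"age": 35, "zip": "94107"}, {"age": 52, "zip": "10001"},
        {"age": 53, "zip": "10001"}]

# Iterate: widen the age bands until every group has at least k = 2 members.
k, width = 2, 5
while True:
    general = [{"age": generalise_age(r["age"], width), "zip": r["zip"]}
               for r in rows]
    if smallest_group(general, ["age", "zip"]) >= k:
        break
    width *= 2
```

Each pass applies a privacy technique and then measures the outcome, stopping only once the measurable target is met, which is exactly the continuous-process framing the post argues for.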
Automation of Data De-identification - John Snow Labs
With ever more personal data being produced and stored by organizations, data privacy is becoming an increasing priority. Businesses have access to a lot of sensitive information about their customers, service providers, and employees and are required to protect that data in order to minimize the risks of scams or fraud. De-identification is used to overcome data privacy challenges and keep information safe from unauthorized parties. This post explains what de-identification is, how it works and how natural language processing (NLP) is used to automate the process of removing sensitive data from datasets. De-identification is a technique used to remove any data that could identify a person from a dataset.
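As a toy illustration of automated de-identification on free text, the sketch below redacts two PII categories with regular expressions. This is not the NLP approach the post describes (production systems typically use trained named-entity recognition models); the category names and patterns here are assumptions chosen for brevity.

```python
import re

# Minimal rule-based sketch; real pipelines use trained NER models instead.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def deidentify(text: str) -> str:
    """Replace each detected PII span with its category placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Contact John at john.doe@example.com or 555-123-4567."
clean = deidentify(note)
```

Note that the person's name survives the regex pass, which is precisely why rule-based matching alone is insufficient and NLP models are used to catch context-dependent identifiers.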
Genuine Personal Identifiers and Mutual Sureties for Sybil-Resilient Community Formation
Shahaf, Gal, Shapiro, Ehud, Talmon, Nimrod
While most of humanity is suddenly on the net, the value of this singularity is hampered by the lack of credible digital identities: Social networking, person-to-person transactions, democratic conduct, cooperation and philanthropy are all hampered by the profound presence of fake identities, as illustrated by Facebook's removal of 5.4Bn fake accounts since the beginning of 2019. Here, we introduce the fundamental notion of a "genuine personal identifier", a globally unique and singular identifier of a person, and present a foundation for a decentralized, grassroots, bottom-up process in which every human being may create, own, and protect the privacy of a genuine personal identifier. The solution employs mutual sureties among owners of personal identifiers, resulting in a mutual-surety graph reminiscent of a web-of-trust. Importantly, this approach is designed for a distributed realization, possibly using distributed ledger technology, and does not depend on the use or storage of biometric properties. For the solution to be complete, additional components are needed, notably a mechanism that encourages honest behavior and a sybil-resilient governance system.
- Asia > India (0.28)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Sierra Leone (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Services > e-Commerce Services (0.34)